- Zhang, Shihua (Ed.). Recent advances in single-cell technologies have enabled high-resolution characterization of tissue and cancer compositions. Although numerous tools for dimension reduction and clustering are available for single-cell data analysis, these methods often fail to simultaneously preserve local cluster structure and global data geometry. To address these challenges, we developed a novel analysis framework, Single-Cell Path Metrics Profiling (scPMP), using power-weighted path metrics, which measure distances between cells in a data-driven way. Unlike Euclidean distance and other commonly used distance metrics, path metrics are density sensitive and respect the underlying data geometry. By combining path metrics with multidimensional scaling, we obtain a low-dimensional embedding of the data that preserves both the global data geometry and the cluster structure. We evaluate the method for both clustering quality and geometric fidelity, and it outperforms current scRNA-seq clustering algorithms on a wide range of benchmark data sets.
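To make the idea concrete, here is a minimal sketch of the general technique the abstract describes: a power-weighted path metric computed over a complete graph on the data, followed by classical multidimensional scaling. This is an illustration, not the authors' scPMP implementation; the power parameter p, the dense Floyd-Warshall solve, and the toy data are assumptions.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

def path_metric_embedding(X, p=2.0, n_components=2):
    """Embed X via a power-weighted path metric followed by classical MDS."""
    # Pairwise Euclidean distances between points (cells).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Raise each edge to the power p: one long jump through a sparse region
    # now costs more than many short hops through a dense region, which is
    # what makes the metric density sensitive.
    P = shortest_path(D ** p, method="FW") ** (1.0 / p)
    # Classical multidimensional scaling on the precomputed path metric.
    mds = MDS(n_components=n_components, dissimilarity="precomputed",
              random_state=0)
    return mds.fit_transform(P)

# Toy example (assumed data): two dense blobs; the path metric keeps each
# blob compact while preserving their relative placement.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 5)), rng.normal(1, 0.1, (50, 5))])
print(path_metric_embedding(X).shape)  # (100, 2)
```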
- We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, where they are induced by geodesics under a density-distorted Riemannian metric. We prove that discrete, sample-based Fermat distances converge to their continuum analogues in small neighborhoods with a precise rate that depends on the intrinsic dimensionality of the data and the parameter governing the extent of density weighting in Fermat distances. This is done by leveraging novel geometric and statistical arguments in percolation theory that allow for non-uniform densities and curved domains. Our results are then used to prove that discrete graph Laplacians based on discrete, sample-driven Fermat distances converge to corresponding continuum operators. In particular, we show the discrete eigenvalues and eigenvectors converge to their continuum analogues at a dimension-dependent rate, which allows us to interpret the efficacy of discrete spectral clustering using Fermat distances in terms of the resulting continuum limit. The perspective afforded by our discrete-to-continuum Fermat distance analysis leads to new clustering algorithms for data and related insights into efficient computations associated with density-driven spectral clustering. Our theoretical analysis is supported with numerical simulations and experiments on synthetic and real image data.
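The pipeline this analysis underwrites can be sketched in a few lines: build discrete Fermat distances from samples, convert them into a normalized graph Laplacian, and cluster with its bottom eigenvectors. The kernel bandwidth sigma, the density-weighting exponent alpha, and the dense shortest-path solve below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def fermat_spectral_clustering(X, alpha=3.0, sigma=1.0, n_clusters=2):
    # Discrete Fermat distance: cheapest path where a hop of length r costs
    # r**alpha, so paths through densely sampled regions are preferred.
    E = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    F = shortest_path(E ** alpha, method="FW")
    # Gaussian affinities from Fermat distances, then a normalized Laplacian.
    W = np.exp(-((F / sigma) ** 2))
    d = np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - W / d[:, None] / d[None, :]
    # Embed with the bottom eigenvectors and cluster in the spectral domain.
    _, vecs = eigh(L, subset_by_index=[0, n_clusters - 1])
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(vecs)

# Toy example (assumed data): two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.2, (40, 2)), rng.normal(1, 0.2, (40, 2))])
print(fermat_spectral_clustering(X))
```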
- Kontorovich, Aryeh (Ed.). In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive the corresponding loss functions. We show that even if the data is noisy, the ground truth linear metric can be learned to any precision provided access to enough samples, and we provide a corresponding sample complexity bound. Moreover, we present an effective way to truncate the learned model to a low-rank model that provably maintains the accuracy in the loss function and in the parameters, the first results of this type. Several experimental observations on synthetic and real data sets support and inform our theoretical results.
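A compact sketch of this setup: learn a positive semidefinite matrix M (so the linear map is x -> Lx with M = L^T L) by minimizing a convex squared loss over noisy squared-distance targets, then truncate M to low rank. The projected-gradient solver, the particular loss, and the toy ground truth below are assumptions chosen for illustration, not the paper's algorithm.

```python
import numpy as np

def learn_metric(Z, targets, n_iter=500, lr=1e-2):
    """Fit a PSD matrix M so that z^T M z matches target squared distances.

    Z holds pair-difference vectors z = x_i - x_j; the squared loss is
    convex in M, and each step projects back onto the PSD cone.
    """
    M = np.eye(Z.shape[1])
    for _ in range(n_iter):
        resid = np.einsum("nd,de,ne->n", Z, M, Z) - targets
        grad = 2 * np.einsum("n,nd,ne->de", resid, Z, Z) / len(resid)
        M -= lr * grad
        vals, vecs = np.linalg.eigh(M)               # project onto PSD cone:
        M = (vecs * np.clip(vals, 0, None)) @ vecs.T  # clip negative eigenvalues
    return M

def truncate_rank(M, r):
    # Keep the top-r eigen-directions; the truncated model remains PSD.
    vals, vecs = np.linalg.eigh(M)
    keep = np.argsort(vals)[-r:]
    return (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T

# Toy example (assumed): recover a rank-1 ground-truth metric from noisy targets.
rng = np.random.default_rng(2)
M_true = np.diag([4.0, 0.0, 0.0])                 # hypothetical ground truth
Z = rng.normal(size=(200, 3))                     # pair-difference vectors
t = np.einsum("nd,de,ne->n", Z, M_true, Z) + rng.normal(0, 0.1, 200)
print(np.round(truncate_rank(learn_metric(Z, t), r=1), 2))
```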
- Balan, Radu (Ed.). In this paper, we generalize finite-depth wavelet scattering transforms, which we formulate as L^q(R^n) norms of a cascade of continuous wavelet transforms (or dyadic wavelet transforms) and contractive nonlinearities. We then provide norms for these operators, prove that they are well defined, and prove that, in specific cases, they are Lipschitz continuous with respect to the action of C^2 diffeomorphisms. Lastly, we extend our results to formulate an operator invariant to the action of rotations and an operator equivariant to the action of rotations.
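The cascade structure is easy to see in a discrete toy version: convolve with a small band-pass filter bank, take the modulus (a contractive nonlinearity), repeat, and summarize each path with an L^q norm. The Gabor-style filters, the depth-2 cascade, and the circular-shift test below are illustrative assumptions standing in for the paper's continuous wavelet transforms on L^q(R^n).

```python
import numpy as np

def gabor_bank(n, scales):
    # Band-pass filters at dyadic scales, defined in the frequency domain.
    freqs = np.fft.fftfreq(n)
    return [np.exp(-0.5 * ((np.abs(freqs) - 0.25 / 2**j) / (0.1 / 2**j)) ** 2)
            for j in scales]

def scattering_coeffs(x, depth=2, scales=range(3), q=2.0):
    # Cascade: wavelet convolution -> modulus (contractive nonlinearity),
    # repeated `depth` times; each path is summarized by an L^q norm.
    bank = gabor_bank(len(x), scales)
    layer, coeffs = [x.astype(complex)], []
    for _ in range(depth):
        nxt = []
        for u in layer:
            for h in bank:
                v = np.abs(np.fft.ifft(np.fft.fft(u) * h))  # |u * psi_j|
                coeffs.append((np.sum(v ** q) / len(v)) ** (1.0 / q))
                nxt.append(v)
        layer = nxt
    return np.array(coeffs)

# Stability check: a small circular shift barely moves the coefficients,
# since the norm pooling discards the shift.
t = np.linspace(0, 1, 256, endpoint=False)
sig = np.sin(2 * np.pi * 8 * t)
print(np.linalg.norm(scattering_coeffs(sig) - scattering_coeffs(np.roll(sig, 3))))
```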